Combining Knowledge Sources to Reorder N-Best Speech Hypothesis Lists

Authors

  • Manny Rayner
  • David M. Carter
  • Vassilios Digalakis
  • Patti Price
Abstract

A simple and general method is described that can combine different knowledge sources to reorder N-best lists of hypotheses produced by a speech recognizer. The method is automatically trainable, acquiring information from both positive and negative examples. In experiments, the method was tested on a 1000-utterance sample of unseen ATIS data.

1. INTRODUCTION

During the last few years, the previously separate fields of speech and natural language processing have moved much closer together, and it is now common to see integrated systems containing components for both speech recognition and language processing. An immediate problem is the nature of the interface between the two. A popular solution has been the N-best list (see, for example, [9]): for some N, the speech recognizer hands the language processor the N utterance hypotheses it considers most plausible. The recognizer chooses the hypotheses on the basis of the acoustic information in the input signal and, usually, a simple language model such as a bigram grammar. The language processor brings more sophisticated linguistic knowledge sources to bear, typically some form of syntactic and/or semantic analysis, and uses them to choose the most plausible member of the N-best list. We will call an algorithm that selects a member of the N-best list a preference method. The most common preference method is to select the highest member of the list that receives a valid semantic analysis; we will refer to this as the "highest-in-coverage" method. Intuitively, highest-in-coverage seems a promising idea. However, practical experience shows that it is surprisingly hard to use it to extract concrete gains. For example, a recent paper [8] concluded that the highest-in-coverage candidate was, in terms of the word error rate, only very marginally better than the one the recognizer considered best.
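The highest-in-coverage method described above is simple enough to sketch directly. The following is a minimal illustration, not the paper's code; the function names and the fallback to the recognizer's top choice are my assumptions:

```python
def highest_in_coverage(nbest, has_valid_analysis):
    """Pick the highest-ranked hypothesis that receives a valid
    semantic analysis; fall back to the recognizer's top choice.

    nbest -- list of hypothesis strings, ordered best-first by
             recognizer score.
    has_valid_analysis -- predicate standing in for the linguistic
             analyser (a hypothetical stub here).
    """
    for hyp in nbest:  # scan in recognizer order
        if has_valid_analysis(hyp):
            return hyp
    return nbest[0]  # nothing in coverage: keep the recognizer's choice
```

Note that when no hypothesis is in coverage, the method degenerates to trusting the recognizer, which is one reason its gains over the baseline can be modest.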
In view of the considerable computational overhead required to perform linguistic analysis on a large number of speech hypotheses, the worth of such a method is dubious. In this paper, we will describe a general strategy for constructing a preference method as a near-optimal combination of a number of different knowledge sources. By a "knowledge source", we will mean any well-defined procedure that associates some potentially meaningful piece of information with a given utterance hypothesis H. Some examples of knowledge sources are:

  • The plausibility score originally assigned to H by the recognizer
  • The sets of surface unigrams, bigrams and trigrams present in H
  • Whether or not H receives a well-formed syntactic/semantic analysis
  • If so, properties of that analysis

The methods described here were tested on a 1001-utterance unseen subset of the ATIS corpus; speech recognition was performed using SRI's DECIPHER™ recognizer [7, 5], and linguistic analysis by a version of the Core Language Engine (CLE [2]). For 10-best hypothesis lists, the best method yielded proportional reductions of 13% in the word error rate and 11% in the sentence error rate; if sentence error was scored in the context of the task, the reduction was about 21%. By contrast, the corresponding figures for the highest-in-coverage method were a 7% reduction in word error rate, a 5% reduction in sentence error rate (strictly measured), and a 12% reduction in the sentence error rate in the context of the task.

The rest of the paper is laid out as follows. In Section 2 we describe a method that allows different knowledge sources to be merged into a near-optimal combination. Section 3 describes the experimental results in more detail. Section 4 concludes.

2. COMBINING KNOWLEDGE SOURCES

Different knowledge sources (KSs) can be combined.
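A knowledge source in the sense used here is just a procedure mapping a hypothesis to information about it. As a minimal sketch (my own illustration, not the paper's implementation), the surface N-gram source from the list above might look like:

```python
def surface_ngrams(hypothesis, n):
    """Knowledge source returning the set of surface n-grams of a
    hypothesis (n=1 unigrams, n=2 bigrams, n=3 trigrams).

    hypothesis -- a hypothesis string as produced by the recognizer.
    """
    words = hypothesis.split()
    # each n-gram is represented as a tuple of consecutive words
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
```

A numeric source such as "received a well-formed analysis" would instead return a single score (1 or 0), so in practice the two kinds of KS output are handled separately, as the next paragraph explains.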
We begin by assuming the existence of a training corpus of N-best lists produced by the recognizer, each list tagged with a "reference sentence" that determines which (if any) of the hypotheses in it was correct. We analyse each hypothesis H in the corpus using a set of possible KSs, each of which associates some form of information with H. Information can be of two different kinds. Some KSs may directly produce a number that can be viewed as a measure of H's plausibility. Typical examples are the score the recognizer assigned to H, and a score recording whether or not H received a linguistic analysis (1 or 0, respectively). More commonly, however, the KS will produce a list of one or more "linguistic items" associated with H, for example the surface N-grams in H or the grammar rules occurring in the best linguistic analysis of H, if there was one. A given linguistic item L is associated with a numerical score through a "discrimination function" (one function for each type of linguistic item), which summarizes the relative frequencies of occurrence of L in correct and incorrect hypotheses, respectively. Discrimination functions are discussed in more detail shortly. The score assigned to H
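The excerpt is cut off before the discrimination function is defined. One natural realization of "summarizes the relative frequencies of L in correct and incorrect hypotheses" is a smoothed log-likelihood ratio; the sketch below assumes that form (the function names, the add-alpha smoothing, and the exact formula are my assumptions, not necessarily the paper's definition):

```python
import math

def make_discrimination_function(correct_counts, incorrect_counts,
                                 n_correct, n_incorrect, alpha=0.5):
    """Build a discrimination function for one type of linguistic item
    from training counts.

    correct_counts / incorrect_counts -- dicts mapping an item to the
        number of correct / incorrect training hypotheses it occurred in.
    n_correct / n_incorrect -- total numbers of correct / incorrect
        training hypotheses.
    alpha -- simple add-alpha smoothing constant so that unseen items
        get a score near zero rather than an infinite one.
    """
    def discriminate(item):
        p_correct = (correct_counts.get(item, 0) + alpha) / (n_correct + alpha)
        p_incorrect = (incorrect_counts.get(item, 0) + alpha) / (n_incorrect + alpha)
        # positive: item favours correct hypotheses; negative: incorrect ones
        return math.log(p_correct / p_incorrect)
    return discriminate
```

Under this reading, an item seen mostly in correct hypotheses contributes a positive score, one seen mostly in incorrect hypotheses a negative score, and an unseen item (with equal-sized correct and incorrect samples) scores zero.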




Journal:
  • CoRR

Volume: abs/cmp-lg/9407010

Pages: -

Publication date: 1994